
    Algorithms, applications and systems towards interpretable pattern mining from multi-aspect data

    How do humans move around in urban space, and how does their movement change when the city undergoes terrorist attacks? How do users behave in Massive Open Online Courses (MOOCs), and how does behavior differ between those who earn certificates and those who do not? From which areas of the court do elite players, such as Stephen Curry and LeBron James, like to take their shots over the course of a game? How can we uncover the hidden habits that govern our online purchases? Are there unspoken agendas in how different states pass legislation of certain kinds? At the heart of these seemingly unconnected puzzles is the same mystery of multi-aspect mining, i.e., how can we mine and interpret the hidden patterns in a dataset that simultaneously reveal the associations, or changes in the associations, among various aspects of the data (e.g., a shot could be described with three aspects: player, time of the game, and area of the court)? Solving this problem could open the gates to a deep understanding of the underlying mechanisms of many real-world phenomena. While much of the research in multi-aspect mining contributes a broad scope of innovations in the mining part, interpretation of patterns from the perspective of users (or domain experts) is often overlooked. Questions such as what users require of patterns, how good the patterns are, or how to read them have barely been addressed. Without efficient and effective ways of involving users in the process of multi-aspect mining, the results are likely to be difficult for them to comprehend. This dissertation proposes the M^3 framework, which consists of multiplex pattern discovery, multifaceted pattern evaluation, and multipurpose pattern presentation, to tackle the challenges of multi-aspect pattern discovery. Based on this framework, we develop algorithms, applications, and analytic systems to enable interpretable pattern discovery from multi-aspect data.
Following the concept of meaningful multiplex pattern discovery, we propose PairFac to close the gap between human information needs and naive mining optimization. We demonstrate its effectiveness in the context of impact discovery in the aftermath of urban disasters. We develop iDisc to target the crossing of multiplex pattern discovery with multifaceted pattern evaluation. iDisc meets the specific information need of understanding multi-level, contrastive behavior patterns. As an example, we use iDisc to predict student performance outcomes in Massive Open Online Courses given users' latent behaviors. FacIt is an interactive visual analytic system that sits at the intersection of all three components and enables interpretable, fine-tunable, and scrutinizable pattern discovery from multi-aspect data. We demonstrate each work's significance and implications in its respective problem context. As a whole, this series of studies is an effort to instantiate the M^3 framework and push the field of multi-aspect mining towards a more human-centric process in real-world applications.

    Twitter in Academic Conferences: Usage, Networking and Participation over Time

    Twitter is often referred to as a backchannel for conferences. While the main conference takes place in a physical setting, attendees and virtual attendees socialize, introduce new ideas, or broadcast information by microblogging on Twitter. In this paper we analyze scholars' Twitter use in 16 Computer Science conferences over a timespan of five years. Our primary finding is that, over the years, there are increasing differences between conversational use and informational use of Twitter. We studied the interaction network between users to understand whether assumptions about the structure of the conversations hold over time and across different types of interactions, such as retweets, replies, and mentions. While "people come and people go", we want to understand what keeps people engaged with the conference on Twitter. By casting the problem as a classification task, we find different factors that contribute to users' continuing participation in the conference's online Twitter activity. These results have implications for research communities seeking to implement strategies for continuous and active participation among members.

    Tweeting Questions in Academic Conferences: Seeking or Promoting Information?

    The fast growth of social media has reshaped traditional human interaction and information-seeking behavior, which has drawn research attention to characterizing the new information-seeking paradigm. However, results from previous studies might not be well grounded in certain social settings. In this paper, we leverage machine learning techniques to identify different types of question tweets within academic communities as an example of one particular social context. By studying over 160 thousand tweets posted by 30 academic communities, we discovered a different landscape of information-seeking behaviors, where fewer tweets are regarded as question tweets and more real information-seeking tweets are observed. We also found that users respond differently to different types of question tweets. We believe our study will be beneficial for understanding information-seeking behaviors in social media.

    Information Seeking in Academic Conferences

    The data sets released here have been used in our study on longitudinal information-seeking and social networking behaviors across academic communities. Social media like Twitter have been widely used in physical gatherings, such as conferences and sports events, as a "backchannel" to facilitate conversations among participants. However, how event participants seek information in those situations has remained largely unexplored. There are three key results: (1) Our study takes the first initiative to characterize the information seeking and responding networks in a concrete context, academic conferences, as one example of physical gatherings. By studying over 190 thousand tweets posted by 66 academic communities over five years, we unveil the landscape of information-seeking activities and the associated social and temporal contexts during the conferences. (2) We leverage crowdsourcing and machine learning techniques to identify distinct types of information-seeking tweets in academic communities. We show that information needs can be differentiated by their posting time and content, as well as by how they were responded to. Interestingly, users' tendencies to post certain types of information needs can be inferred from prior tweeting activities and network positions. (3) Moreover, our results suggest it is also possible to predict the potential respondents to different types of information needs. Our study was based on two data sets: (1) a long-term collection of tweets posted by 66 academic communities over five years, and (2) a subset of information-seeking tweets with human-annotated labels (the types of questions). We are making the data sets available for academic researchers and public use, to enable the discovery of new insights and the development of better techniques to facilitate information seeking. Dataset (1): The conference tweets were collected through keyword search using the Topsy API in 2014.
The keywords vary for each conference and each year, but typically include two parts in the text and follow the format of "Conference Acronym"+"Year". For example, the International World Wide Web Conference in 2013 would have the hashtag "www2013". Duration: 2008 to 2013. Total number of tweets: 334,507. Dataset (2): We further identify the information-seeking tweets by checking whether the tweet contains a question mark (?) in its text. We then design the information-seeking question categorization and develop the code book to help human subjects identify the question type. The human annotations are obtained from Amazon Mechanical Turk. Based on the human annotations, we train machine classifiers to identify the question types for the remaining information-seeking tweets. Duration: 2008 to 2013. Total number of labeled information-seeking tweets: 1,899. Total number of unlabeled information-seeking tweets: 9,967. Publication: If you make use of this data set, please cite: Wen, X., & Lin, Y. R. (2015, November). Information Seeking and Responding Networks in Physical Gatherings: A Case Study of Academic Conferences in Twitter. In Proceedings of the 2015 ACM on Conference on Online Social Networks (pp. 197-208). ACM.
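The two selection rules described for this data set (the "Conference Acronym"+"Year" keyword format and the question-mark check) can be sketched as follows. This is a minimal illustration, not the authors' collection code: the function names and sample tweets are hypothetical, and lowercasing the acronym is an assumption based only on the "www2013" example.

```python
def conference_keyword(acronym: str, year: int) -> str:
    """Build a search keyword in the "Conference Acronym"+"Year" format.

    Lowercasing is an assumption drawn from the "www2013" example.
    """
    return f"{acronym.lower()}{year}"


def is_candidate_question(tweet_text: str) -> bool:
    """Flag a tweet as a candidate information-seeking tweet if it contains '?'."""
    return "?" in tweet_text


def filter_candidates(tweets: list[str]) -> list[str]:
    """Keep only the tweets that pass the question-mark check."""
    return [text for text in tweets if is_candidate_question(text)]


# Hypothetical sample tweets, for illustration only.
sample_tweets = [
    "Where is the keynote room? #www2013",
    "Great talk on web search #www2013",
]
keyword = conference_keyword("WWW", 2013)
candidates = filter_candidates(sample_tweets)
```

The question-mark check only produces candidates; according to the description above, the question types were then assigned through human annotation and trained classifiers.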

    Geo-tagged Tweets in Paris during Nov 2015

    Abstract: The data sets released here have been used in our study on quantitatively evaluating the impact of disasters in the city. The study of disaster events and their impact on the urban space has traditionally been conducted through manual collection and analysis of surveys, questionnaires, and authority documents. While there have been increasingly rich troves of human behavioral data related to the events of interest, the ability to obtain hindsight following a disaster event has not been scaled up. In this study, we propose a novel approach for analyzing events called PairFac. PairFac utilizes discriminant tensor analysis to automatically discover the impact of a major event from rich human behavioral data. Our method aims to (i) uncover the persistent patterns across multiple interrelated aspects of urban behavior (e.g., when, where, and what citizens do in a city) and at the same time (ii) identify the salient changes following a potentially impactful event. We show the effectiveness of PairFac in comparison with previous methods through extensive experiments. We also demonstrate the advantages of our approach through case studies with real-world traffic sensor data and social media streams surrounding the 2015 terrorist attacks in Paris. Our work has both methodological contributions in studying the impact of an external stimulus on a system and practical implications in the area of disaster event analysis and assessment.
Dataset: Two datasets are used in this study, a traffic sensor dataset and a social media dataset. The traffic sensor dataset was collected from open data Paris. It includes all the hourly data for the flow and the occupancy rate assembled by the permanent traffic sensors installed on the Paris City network (urban network and peripheral boulevard). Interested readers can find it at: https://opendata.paris.fr/explore/dataset/comptages-routiers-permanents/ The social media dataset was collected through the Twitter API. It contains geo-tagged tweets from Paris collected between Oct 16, 2015 and Nov 20, 2015. Duration: 2015-10-16 to 2015-11-20. Total number of tweets: 75,982. Publication: If you make use of this data set, please cite: Xidao Wen, Yu-Ru Lin, and Konstantinos Pelechrinis. 2016. PairFac: Event Analytics through Discriminant Tensor Factorization. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM '16). ACM, New York, NY, USA, 519-528. DOI: https://doi.org/10.1145/2983323.2983837

    Voila: Visual Anomaly Detection and Monitoring with Streaming Spatiotemporal Data

    The increasing availability of spatiotemporal data continuously collected from various sources provides new opportunities for a timely understanding of the data in their spatial and temporal context. Finding abnormal patterns in such data poses significant challenges. Given that there is often no clear boundary between normal and abnormal patterns, existing solutions are limited in their capacity to identify anomalies in large, dynamic, and heterogeneous data, to interpret anomalies in their multifaceted, spatiotemporal context, and to allow users to provide feedback in the analysis loop. In this work, we introduce a unified visual interactive system and framework, Voila, for interactively detecting anomalies in spatiotemporal data collected from a streaming data source. The system is designed to meet two requirements in real-world applications, i.e., online monitoring and interactivity. We propose a novel tensor-based anomaly analysis algorithm with visualization and interaction design that dynamically produces contextualized, interpretable data summaries and allows for interactively ranking anomalous patterns based on user input. Using the “smart city” as an example scenario, we demonstrate the effectiveness of the proposed framework through quantitative evaluation and qualitative case studies.